Mach1 for Nonuniform Time-scale Modification of Speech: Theory, Technique, and Comparisons
Abstract
We propose a new approach to nonuniform time compression, called Mach1, designed to mimic the natural timing of fast speech. At identical overall compression rates, listener comprehension for Mach1-compressed speech increased between 5 and 31 percentage points over that for linearly compressed speech, and response times dropped by 15%. For rates between 2.5 and 4.2 times real time, there was no significant comprehension loss with increasing Mach1 compression rates. In A–B preference tests, Mach1-compressed speech was chosen 95% of the time. This paper describes the Mach1 technique and our listener-test results. Audio examples can be found at http://www.interval.com/papers/1997-061/.
The research described in this paper is the basis for our submission to the 1998 International Conference on Acoustics, Speech, and Signal Processing (ICASSP). This report describes our approach and our results more completely than the ICASSP paper format allows. However, since our ICASSP submission is effectively a subset of this report, we have included the IEEE copyright notice below.
Interval Research Corporation Technical Report #1997-061
Copyright 1998 IEEE. Published in the Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, May 12-15, 1998, Seattle, Washington. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works, must be obtained from the IEEE. Contact: Manager, Copyrights and Permissions / IEEE Service Center / 445 Hoes Lane / P.O. Box 1331 / Piscataway, NJ 08855-1331, USA. Telephone: + Intl. 908-562-3966.
Similar resources
MACH1: nonuniform time-scale modification of speech
Time-compression techniques change the playback rate of speech without introducing pitch artifacts. However, when linear-compression techniques are used, human comprehension of time-compressed speech typically degrades at compression rates above two times real time [1]. These degradations are not due to the speech rate per se: Comprehension of linearly compressed speech often breaks down above ...
Free Vibration Analysis of Nonuniform Microbeams Based on Modified Couple Stress Theory: an Analytical Solution
In this study, an analytical solution is presented to calculate the free vibration frequencies of nonuniform microbeams. Scale effects are modelled using modified couple stress theory, and the microbeam is assumed to be thin, while Poisson's ratio effects are also taken into account. Nonuniformity is represented by an exponentially varying width along the microbeam while the thickness remains constant. R...
Modification of Audible and Visual Speech
Speech is one of the most common and richest methods that people use to communicate with one another. Our facility with this communication form makes speech a good interface for communicating with or via computers. At the same time, our familiarity with speech makes it difficult to generate synthetic but natural-sounding speech and synthetic but natural-looking lip-synced faces. One way to reduc...
An Overlap-add Technique Based on Waveform Similarity (WSOLA) for High Quality Time-scale Modification of Speech
A concept of waveform similarity is proposed for tackling the problem of time-scale modification of speech, and is worked out in the context of short-time Fourier transform representations. The resulting WSOLA algorithm produces high-quality speech output, is algorithmically and computationally efficient and robust, and allows for on-line processing with arbitrary time-scaling factors that may b...
Effects of Pitch Contours Stylization and Time Scale Modification on Natural Speech Synthesis
This paper describes a method for generating intonated speech for natural speech synthesis using a prosody generation model. The effects of pitch modification through pitch contour stylization for parameter extraction, and of time scale modification for its implementation, are discussed. An approach for close-copy syllabic stylization is described. In the latter part, algorithm for impl...